Abstract. In this paper, we present a fast algorithm for constructing a concept
(Galois) lattice of a binary relation, including computing all concepts and their
lattice order. We also present two efficient variants of the algorithm, one for
computing all concepts only, and one for constructing a frequent closed itemset lattice.
The running time of our algorithms depends on the lattice structure and is faster
than all other existing algorithms for these problems.



1 Introduction 

Formal Concept Analysis (FCA) [14] has found many applications since its introduction.
As the size of datasets grows, such as data generated from high-throughput technologies
in bioinformatics, there is a need for efficient algorithms for constructing concept
lattices. The input of FCA consists of a triple $(\mathcal{O}, \mathcal{M}, \mathcal{I})$,
called a context, where $\mathcal{O}$ is a set of objects, $\mathcal{M}$ is a set of
attributes, and $\mathcal{I}$ is a binary relation between $\mathcal{O}$ and $\mathcal{M}$.
In FCA, the context is structured into a set of concepts. The set of all concepts,
when ordered by set-inclusion, satisfies the properties of a complete lattice. The lattice
of all concepts is called the concept [24] or Galois [9] lattice. When the binary relation is
represented as a bipartite graph, each concept corresponds to a maximal bipartite clique
(or maximal biclique). There is also a one-to-one correspondence between a closed itemset [34]
studied in data mining and a concept in FCA. The one-to-one correspondence among these
notions - concepts in FCA, maximal bipartite cliques in theoretical computer science
(TCS), and closed itemsets in data mining (DM) - was known, e.g. [3, 34]. There
is extensive work on the related problems in these three communities, e.g. [2]-[8] in
TCS, [10]-[23] in FCA, and [25]-[36] in DM. In general, in TCS, the research focuses
on efficiently enumerating all maximal bipartite cliques (of a bipartite graph); in FCA,
one is interested in the lattice structure of all concepts; in DM, one is often interested in
computing frequent closed itemsets only.



Time complexity. Given a bipartite graph, it is not difficult to see that there can be
exponentially many maximal bipartite cliques. For problems whose output can be exponential
in the size of the input, Johnson et al. introduced in their seminal paper [6]
several notions of polynomial time: polynomial total time, incremental polynomial time,
and polynomial delay time. An algorithm runs in polynomial total time if its running time
is bounded by a polynomial in the size of the input and the size of the output. An
algorithm runs in incremental polynomial time if the time required to generate each
successive output is bounded by a polynomial in the size of the input and the size
of the output generated thus far. An algorithm runs in polynomial delay time if the time
to generate each output is bounded by a polynomial in the size of the input only. It is
not difficult to see that polynomial delay is stronger than incremental polynomial time
(namely, an algorithm with polynomial delay also runs in incremental polynomial time),
which in turn is stronger than polynomial total time. For a polynomial delay algorithm,
we can further distinguish whether the space used is polynomial or exponential in the
input size.

Previous work. Observe that the maximal bipartite clique (MBC) problem is a special
case of the maximal clique problem in a general graph. Namely, given a bipartite graph
$G = (V_1, V_2, E)$, a maximal bipartite clique corresponds to a maximal clique in
$\hat{G} = (V_1 \cup V_2, \hat{E})$ where $\hat{E} = E \cup (V_1 \times V_1) \cup (V_2 \times V_2)$.
Consequently, any algorithm for enumerating all maximal cliques in a general graph,
e.g., [8, 6], also solves the MBC problem. In fact, the best known algorithm for
enumerating all maximal bipartite cliques, proposed by Makino and Uno [7], takes
$O(\Delta^2)$ polynomial delay time, where $\Delta$ is the maximum degree of $G$, and is
based on this approach. The fact that the set of maximal bipartite cliques constitutes a
lattice was not observed in that paper, and thus the property was not utilized in the
enumeration algorithm.

In FCA, much research has been devoted to studying the properties of the lattice
structure. There are several algorithms, e.g. [19, 23, 18], that construct the lattice, i.e.
compute all concepts together with their lattice order. There are also some algorithms
that compute only concepts, e.g. [21, 14]. (We remark that the idea of using a total
lectic order on concepts in Ganter's algorithm [14] is also used in [6, 7] for enumerating
maximal (bi)cliques.) See [16] for a comparison study of these algorithms. The best
polynomial total time algorithm is by Nourine and Raynaud [19], with $O(nm|\mathcal{B}|)$
time and $O(n|\mathcal{B}|)$ space, where $n = |\mathcal{O}|$, $m = |\mathcal{M}|$, and
$\mathcal{B}$ denotes the set of all concepts. This algorithm can be easily modified to run
in $O(mn)$ incremental time [20]. Observe that space of the total size of all concepts is
needed if one is to keep the entire structure explicitly. There are several other
algorithms, e.g. [14, 18], that all run in $O(n^2 m)$ polynomial delay. There is another
algorithm [23] that is based on a divide-and-conquer approach, but its analytical running
time is unknown, as it is difficult to analyze.

There are several algorithms in data mining for computing frequent closed itemsets,
such as CHARM(-L) [35, 36] and CLOSET(+) [29, 32]. To the best of our knowledge, the
algorithm with a theoretical running-time analysis is the one given in [3], with
$O(m^2 n)$ incremental polynomial running time, where $n = |\mathcal{O}|$ and
$m = |\mathcal{M}|$.

Our Results. In this paper, by making use of the lattice structure of concepts, we present
a simple and fast algorithm for computing all concepts together with their lattice order.
The main idea of the algorithm is that, given a concept, when all of its successors are
considered together (i.e. in a batch manner), they can be computed efficiently. We
compute concepts in Breadth First Search (BFS) order - the ordering given by a BFS
traversal of the lattice. When computing the concepts in this way, we not only compute
all concepts but also identify all successors of each concept. Another idea of
the algorithm is that we make use of the concepts generated so far to dynamically update
the adjacency relations. The running time of our algorithm is
$O(\sum_{a \in \mathrm{ext}(C)} |\mathrm{cnbr}(a)|)$ polynomial delay for each concept $C$
(see Section 2 for related background and terminology), where $\mathrm{cnbr}(a)$ is the
reduced adjacency list of $a$. Our algorithm is faster than the best known algorithms for
constructing a lattice because it is faster than a basic algorithm that runs in
$O(\sum_{a \in \mathrm{ext}(C)} |\mathrm{nbr}(a)|)$, where $|\mathrm{nbr}(a)|$ is the number
of attributes adjacent to the object $a$, and this basic algorithm is already as fast as
the current best algorithms for the problem.

We also present two variants of the algorithm: one computes all concepts only,
and the other constructs the frequent closed itemset lattice. Both algorithms are faster
than the current state-of-the-art programs for these problems.

Outline. The paper is organized as follows. In Section 2, we review some background 
and notation on FCA. In Section 3, we describe some basic properties of concepts that 
we use in our lattice-construction algorithm. In Section 4, we first describe the high level 
idea of our algorithm. Then we describe how to efficiently implement the algorithm. In 
Section 5, we describe two variants of the algorithm. One is for computing all concepts 
only and another is for constructing a frequent closed itemset lattice. We conclude with 
discussion in Section 6. 

2 Background and Terminology on FCA 

In FCA, a triple $(\mathcal{O}, \mathcal{M}, \mathcal{I})$ is called a context, where
$\mathcal{O} = \{g_1, g_2, \ldots, g_n\}$ is a set of $n$ elements, called objects;
$\mathcal{M} = \{1, 2, \ldots, m\}$ is a set of $m$ elements, called attributes;
and $\mathcal{I} \subseteq \mathcal{O} \times \mathcal{M}$ is a binary relation. The context
is often represented by a cross-table as shown in Figure 1. A set $X \subseteq \mathcal{O}$
is called an object set, and a set $J \subseteq \mathcal{M}$ is called an attribute set.
Following the convention, we write an object set $\{a, c, e\}$ as $ace$, and
an attribute set $\{1, 3, 4\}$ as $134$.

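For $i \in \mathcal{M}$, denote the adjacency list of $i$ by
$\mathrm{nbr}(i) = \{g \in \mathcal{O} : (g, i) \in \mathcal{I}\}$.
Similarly, for $g \in \mathcal{O}$, denote the adjacency list of $g$ by
$\mathrm{nbr}(g) = \{i \in \mathcal{M} : (g, i) \in \mathcal{I}\}$.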

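Definition 1. The function $\mathrm{attr} : 2^{\mathcal{O}} \to 2^{\mathcal{M}}$ maps a set
of objects to their common attributes: $\mathrm{attr}(X) = \bigcap_{g \in X} \mathrm{nbr}(g)$,
for $X \subseteq \mathcal{O}$. The function $\mathrm{obj} : 2^{\mathcal{M}} \to 2^{\mathcal{O}}$
maps a set of attributes to their common objects:
$\mathrm{obj}(J) = \bigcap_{j \in J} \mathrm{nbr}(j)$, for $J \subseteq \mathcal{M}$.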

It is easy to check that for $X \subseteq \mathcal{O}$, $X \subseteq \mathrm{obj}(\mathrm{attr}(X))$,
and for $J \subseteq \mathcal{M}$, $J \subseteq \mathrm{attr}(\mathrm{obj}(J))$.

Definition 2. An object set $X \subseteq \mathcal{O}$ is closed if
$X = \mathrm{obj}(\mathrm{attr}(X))$. An attribute set $J \subseteq \mathcal{M}$ is closed
if $J = \mathrm{attr}(\mathrm{obj}(J))$.

The composition of obj and attr induces a Galois connection between $2^{\mathcal{O}}$
and $2^{\mathcal{M}}$. Readers are referred to [14] for properties of the Galois connection.
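As a concrete illustration of Definitions 1 and 2, the following Python sketch evaluates attr and obj on the four-object context of Figure 1. The cross-table itself is not reproduced in the text, so the dictionary `NBR` is an assumption, reconstructed from the example that follows Proposition 1; the names `NBR`, `attr`, `obj` and `is_closed_attribute_set` are illustrative only.

```python
# Assumed reconstruction of the Figure 1 context: object -> adjacent attributes.
NBR = {"a": {1, 3}, "b": {1, 2, 4}, "c": {1, 3}, "d": {2, 4}}
OBJECTS = frozenset(NBR)
ATTRIBUTES = frozenset({1, 2, 3, 4})


def attr(objects):
    """Attributes shared by every object in `objects` (all of M if empty)."""
    common = set(ATTRIBUTES)
    for g in objects:
        common &= NBR[g]
    return frozenset(common)


def obj(attributes):
    """Objects that carry every attribute in `attributes` (all of O if empty)."""
    wanted = set(attributes)
    return frozenset(g for g in OBJECTS if wanted <= NBR[g])


def is_closed_attribute_set(attributes):
    """Definition 2: an attribute set J is closed when J = attr(obj(J))."""
    return attr(obj(attributes)) == frozenset(attributes)
```

Under this assumed context, `obj({3})` is the object set ac and `attr(obj({3}))` is 13, so the attribute set 3 is not closed while 24 is.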

Definition 3. A pair $C = (A, B)$, with $A \subseteq \mathcal{O}$ and $B \subseteq \mathcal{M}$,
is called a concept if $A = \mathrm{obj}(B)$ and $B = \mathrm{attr}(A)$.

For a concept $C = (A, B)$, by definition, both $A$ and $B$ are closed. The object set
$A$ is called the extent of $C$, written as $A = \mathrm{ext}(C)$, and the attribute set $B$
is called the intent of $C$, written as $B = \mathrm{int}(C)$. The set of all concepts of the
context $(\mathcal{O}, \mathcal{M}, \mathcal{I})$ is denoted by
$\mathcal{B}(\mathcal{O}, \mathcal{M}, \mathcal{I})$, or simply $\mathcal{B}$ when the context
is understood.

Let $(A_1, B_1)$ and $(A_2, B_2)$ be two concepts in $\mathcal{B}$. Observe that if
$A_1 \subseteq A_2$, then $B_2 \subseteq B_1$. We order the concepts in $\mathcal{B}$ by the
following relation $\preceq$:

$(A_1, B_1) \preceq (A_2, B_2) \iff A_1 \subseteq A_2 \ (\iff B_2 \subseteq B_1).$

It is not difficult to see that the relation $\preceq$ is a partial order on $\mathcal{B}$.
In fact, $\mathcal{L} = \langle \mathcal{B}, \preceq \rangle$ is a complete lattice and it is
known as the concept or Galois lattice of the context $(\mathcal{O}, \mathcal{M}, \mathcal{I})$.
For $C, D \in \mathcal{B}$ with $C \prec D$, if for all $E \in \mathcal{B}$,
$C \preceq E \preceq D$ implies that $E = C$ or $E = D$, then $C$ is called the
successor$^1$ (or lower neighbor) of $D$, and $D$ is called the predecessor (or upper
neighbor) of $C$. The diagram representing an ordered set (where only
successors/predecessors are connected by edges) is called a Hasse diagram (or a line
diagram). See Figure 1 for an example of the line diagram of a Galois lattice.

For a concept $C = (\mathrm{ext}(C), \mathrm{int}(C))$, $\mathrm{ext}(C) = \mathrm{obj}(\mathrm{int}(C))$
and $\mathrm{int}(C) = \mathrm{attr}(\mathrm{ext}(C))$. Thus, $C$ is uniquely determined by
either its extent, $\mathrm{ext}(C)$, or by its intent, $\mathrm{int}(C)$. We denote the
concepts restricted to the objects $\mathcal{O}$ by
$\mathcal{B}_O = \{\mathrm{ext}(C) : C \in \mathcal{B}\}$, and to the attributes $\mathcal{M}$ by
$\mathcal{B}_M = \{\mathrm{int}(C) : C \in \mathcal{B}\}$. For $A \in \mathcal{B}_O$, the
corresponding concept is $(A, \mathrm{attr}(A))$. For $J \in \mathcal{B}_M$, the corresponding
concept is $(\mathrm{obj}(J), J)$. The order $\preceq$ is completely determined by the
inclusion order on $2^{\mathcal{O}}$ or equivalently by the reverse inclusion order on
$2^{\mathcal{M}}$. That is, $\mathcal{L} = \langle \mathcal{B}, \preceq \rangle$ and
$\mathcal{L}_M = \langle \mathcal{B}_M, \supseteq \rangle$ are order-isomorphic. We have the
property that $(\mathrm{obj}(Z), Z)$ is a successor of $(\mathrm{obj}(X), X)$ in $\mathcal{L}$
if and only if $Z$ is a successor of $X$ in $\mathcal{L}_M$. Since the set of all concepts
is finite, the lattice order relation is completely determined by the covering
(successor/predecessor) relation. Thus, to construct the lattice, it is sufficient to
compute all concepts and identify all successors of each concept.
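To make the covering relation concrete, here is a brute-force Python sketch that closes every attribute subset and then extracts the successor pairs among the resulting intents. It is exponential in the number of attributes and is meant only as a reference point for tiny contexts, not as the algorithm of this paper. The context `NBR` is an assumed reconstruction of Figure 1, and all function names are hypothetical.

```python
from itertools import combinations

# Assumed reconstruction of the Figure 1 context: object -> adjacent attributes.
NBR = {"a": {1, 3}, "b": {1, 2, 4}, "c": {1, 3}, "d": {2, 4}}
ATTRIBUTES = frozenset({1, 2, 3, 4})


def attr(objects):
    common = set(ATTRIBUTES)
    for g in objects:
        common &= NBR[g]
    return frozenset(common)


def obj(attributes):
    wanted = set(attributes)
    return frozenset(g for g in NBR if wanted <= NBR[g])


def all_intents():
    """Close every subset of M; the distinct closures are the concept intents."""
    items = sorted(ATTRIBUTES)
    intents = set()
    for size in range(len(items) + 1):
        for subset in combinations(items, size):
            intents.add(attr(obj(subset)))
    return intents


def covering_pairs(intents):
    """Pairs (X, Z) where Z is a successor of X: X is a proper subset of Z
    and no other intent lies strictly between them."""
    return {
        (x, z)
        for x in intents
        for z in intents
        if x < z and not any(x < y < z for y in intents)
    }
```

For the assumed context this yields six intents (including the full attribute set, whose extent is empty) and seven covering pairs.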

3 Basic Properties 

In this section, we describe some basic properties of the concepts on which our lattice 
construction algorithms are based. 

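Proposition 1. Let $C$ be a concept in $\mathcal{B}(\mathcal{O}, \mathcal{M}, \mathcal{I})$.
For $i \in \mathcal{M} \setminus \mathrm{int}(C)$, if $E_i = \mathrm{ext}(C) \cap \mathrm{nbr}(i)$
is not empty, then $E_i$ is closed. Consequently, $(E_i, \mathrm{attr}(E_i))$ is a concept.

Proof. For $i \in \mathcal{M} \setminus \mathrm{int}(C)$, suppose that
$E_i = \mathrm{ext}(C) \cap \mathrm{nbr}(i)$ is not empty. We will show that
$\mathrm{obj}(\mathrm{attr}(E_i)) = E_i$. Since $E_i \subseteq \mathrm{obj}(\mathrm{attr}(E_i))$,
it remains to show that $\mathrm{obj}(\mathrm{attr}(E_i)) \subseteq E_i$. By definition,
$\mathrm{obj}(\mathrm{int}(C) \cup \{i\}) = (\bigcap_{j \in \mathrm{int}(C)} \mathrm{nbr}(j)) \cap \mathrm{nbr}(i) = \mathrm{ext}(C) \cap \mathrm{nbr}(i) = E_i$.
Thus, $(\mathrm{int}(C) \cup \{i\}) \subseteq \mathrm{attr}(\mathrm{obj}(\mathrm{int}(C) \cup \{i\})) = \mathrm{attr}(E_i)$.
Consequently, $\mathrm{obj}(\mathrm{attr}(E_i)) \subseteq \mathrm{obj}(\mathrm{int}(C) \cup \{i\}) = E_i$. □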

Example. Consider the concept $C = (abcd, \emptyset)$ of the context in Figure 1; we have
$E_1 = abc$, $E_2 = bd$, $E_3 = ac$, $E_4 = bd$.

$^1$ Some authors call this the immediate successor.
Fig. 1. (a) A context $(\mathcal{O}, \mathcal{M}, \mathcal{I})$ with $\mathcal{O} = \{a, b, c, d\}$
and $\mathcal{M} = \{1, 2, 3, 4\}$. The cross × indicates a pair in the relation $\mathcal{I}$.
(b) The corresponding Galois/concept lattice. (c) $\mathrm{Child}(abcd, \emptyset) = \{(abc, 1), (bd, 24), (ac, 3)\}$;
$\mathrm{Child}(abc, 1) = \{(ac, 13), (b, 124)\}$.



3.1 Defining the equivalence classes 

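For a closed attribute set $X \subseteq \mathcal{M}$, denote the set of remaining attributes
$\{i \in \mathcal{M} \setminus X : \mathrm{obj}(X) \cap \mathrm{nbr}(i) \ne \emptyset\}$ by
$\mathrm{res}(X)$. Consider the following equivalence relation $\sim$ on $\mathrm{res}(X)$:
$i \sim j \iff \mathrm{obj}(X) \cap \mathrm{nbr}(i) = \mathrm{obj}(X) \cap \mathrm{nbr}(j)$,
for $i \ne j \in \mathrm{res}(X)$.

Let $S_1, \ldots, S_t$ be the equivalence classes induced by $\sim$, i.e.
$\mathrm{res}(X) = S_1 \cup \ldots \cup S_t$, and
$\mathrm{obj}(X) \cap \mathrm{nbr}(i) = \mathrm{obj}(X) \cap \mathrm{nbr}(j)$ for any
$i \ne j \in S_k$, $1 \le k \le t$. We denote the set $\{S_1, \ldots, S_t\}$ by
$\mathrm{AttrChild}(X)$. We call $S_j$ a sibling of $S_i$ for $j \ne i$. For
convenience, we will write $X \cup S_i$ as $XS_i$. When there is no confusion, we abuse the
notation by writing $X \cup \mathrm{AttrChild}(X) = \{XS : S \in \mathrm{AttrChild}(X)\}$.
Note that by definition, $\mathrm{obj}(XS_k) = \mathrm{obj}(X) \cap \mathrm{obj}(S_k) = \mathrm{obj}(X) \cap \mathrm{nbr}(i)$
for some $i \in S_k$. We denote the set of pairs
$\{(\mathrm{obj}(XS_1), XS_1), \ldots, (\mathrm{obj}(XS_t), XS_t)\}$ by $\mathrm{Child}(\mathrm{obj}(X), X)$.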
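The equivalence classes can be computed directly from this definition by grouping the remaining attributes on the value of obj(X) ∩ nbr(i). The Python sketch below does exactly that. `NBR_ATTR` is an assumed reconstruction of the attribute adjacency lists of Figure 1, and `child` is a hypothetical helper name; the efficient procedure used by the algorithm is described later in Section 4.2.

```python
# Assumed reconstruction of Figure 1: attribute i -> adjacency list nbr(i).
NBR_ATTR = {
    1: frozenset("abc"),
    2: frozenset("bd"),
    3: frozenset("ac"),
    4: frozenset("bd"),
}


def child(obj_x, x):
    """Child(obj(X), X) as a dict {obj(XS): XS}, built from the definition of ~."""
    classes = {}
    for i, objects in NBR_ATTR.items():
        if i in x:
            continue
        e_i = obj_x & objects            # obj(X) ∩ nbr(i)
        if e_i:                          # non-empty, so i belongs to res(X)
            classes.setdefault(e_i, set()).add(i)
    return {e: frozenset(x) | frozenset(s) for e, s in classes.items()}
```

With these inputs, `child(frozenset("abcd"), frozenset())` groups attributes 2 and 4 into one class, matching the children (abc, 1), (bd, 24), (ac, 3) listed in the caption of Figure 1.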

Recall that $\mathcal{L} = \langle \mathcal{B}, \preceq \rangle$ and
$\mathcal{L}_M = \langle \mathcal{B}_M, \supseteq \rangle$ are order-isomorphic. We have
the property that $(\mathrm{obj}(Y), Y)$ is a successor of $(\mathrm{obj}(X), X)$ in
$\mathcal{L}$ if and only if $Y$ is a successor of $X$ in $\mathcal{L}_M$. For each
$S \in \mathrm{AttrChild}(X)$, we call $XS$ a child of $X$ and $X$ a parent of $XS$. By the
definition of the equivalence class, for each $Z$ that is a successor of $X$, there exists
an $S \in \mathrm{AttrChild}(X)$ such that $Z = XS$. That is, if $Z$ is a successor
of $X$, then $Z$ is a child of $X$.

Let $\mathrm{Succ}(X)$ denote the set of all successors of $X$; then
$\mathrm{Succ}(X) \subseteq X \cup \mathrm{AttrChild}(X)$. However, not every child of $X$ is
a successor of $X$. For the example in Figure 1, $\mathrm{AttrChild}(\emptyset) = \{1, 24, 3\}$,
where $1$ and $24$ are successors of $\emptyset$ but $3$ is not:
$\mathrm{Succ}(\emptyset) = \{1, 24\} \subset \mathrm{AttrChild}(\emptyset)$; while
$\mathrm{AttrChild}(1) = \{24, 3\}$ and $\mathrm{Succ}(1) = \{124, 13\} = 1 \cup \mathrm{AttrChild}(1)$.
Similarly, if $P$ is a predecessor of $X$, then $P$ is a parent of $X$, but not every
parent of $X$ is necessarily a predecessor of $X$.

Note that for $S \in \mathrm{AttrChild}(X)$, if $XS \in \mathrm{Succ}(X)$, then by definition
$XS$ is closed. It is easy to check that the converse is also true. Namely, if $XS$ is
closed, then $XS \in \mathrm{Succ}(X)$. In other words, we have the following proposition.

Proposition 2. $\mathrm{Succ}(X) = \{XS : XS \text{ is closed}, S \in \mathrm{AttrChild}(X)\}$.
3.2 Characterizations of Closure

By definition, an attribute set $X$ is closed if $\mathrm{attr}(\mathrm{obj}(X)) = X$. In the
following we give two characterizations for an attribute set being closed based on its
relationship with its siblings.

Proposition 3. For $S \in \mathrm{AttrChild}(X)$, $XS$ is not closed if and only if there exists
$T \in \mathrm{AttrChild}(X)$, $T \ne S$, such that $\mathrm{obj}(XS) \subset \mathrm{obj}(XT)$.
Furthermore, for all $T \in \mathrm{AttrChild}(X)$ with $\mathrm{obj}(XS) \subset \mathrm{obj}(XT)$,
there exists $S' \in \mathrm{AttrChild}(XT)$ such that $S \subseteq S'$,
$\mathrm{obj}(XS) = \mathrm{obj}(XTS')$ and $XS \subset XTS'$.

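Proof. If $XS$ is not closed, by definition, there exists $i \in \mathrm{res}(X) \setminus S$
such that $i \in \mathrm{attr}(\mathrm{obj}(XS))$. As $\mathrm{AttrChild}(X)$ is a partition of
$\mathrm{res}(X)$, there exists a $T \in \mathrm{AttrChild}(X)$ such that $i \in T$, and thus
$\mathrm{obj}(XT) = \mathrm{obj}(X) \cap \mathrm{nbr}(i) \supset \mathrm{obj}(XS)$.

Conversely, suppose there exists $T \in \mathrm{AttrChild}(X)$ such that
$\mathrm{obj}(XS) \subset \mathrm{obj}(XT)$. Then $\mathrm{attr}(\mathrm{obj}(XS)) \supseteq XTS$.
That is, $XS \subset XTS \subseteq \mathrm{attr}(\mathrm{obj}(XS))$,
which implies $XS$ is not closed.

Suppose that $\mathrm{obj}(XS) \subset \mathrm{obj}(XT)$ with $T \in \mathrm{AttrChild}(X)$. For
$i \in S$, $\mathrm{obj}(XT) \cap \mathrm{nbr}(i) = \mathrm{obj}(XT) \cap \mathrm{obj}(X) \cap \mathrm{nbr}(i) = \mathrm{obj}(XT) \cap \mathrm{obj}(XS) = \mathrm{obj}(XS)$.
Thus, there exists $S' \in \mathrm{AttrChild}(XT)$ such that $S \subseteq S'$ and
$\mathrm{obj}(XS) = \mathrm{obj}(XTS')$. Since $X$, $S$, $T$ are disjoint,
$XS \subset XTS \subseteq XTS'$. □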

Based on the first part of this proposition (first characterization), we can test if $XS$
is closed, for $S \in \mathrm{AttrChild}(X)$, by using subset testing of its object set against
its siblings' object sets. Namely, $XS$ is closed if and only if $\mathrm{obj}(XS)$ is not a
proper subset of any sibling's object set. In our running example in Figure 1, $3$ is not
closed because its object set $\mathrm{obj}(3) = ac$ is a proper subset of the object set of
its sibling, $\mathrm{obj}(1) = abc$.
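The first characterization translates into a short filter over the children of a concept. In the Python sketch below, `closed_children` is a hypothetical helper, and the two dictionaries are the children listed in the caption of Figure 1, keyed by object set.

```python
def closed_children(children):
    """Keep the children whose object set is not a proper subset of a sibling's.

    `children` maps each object set obj(XS) to its attribute set XS.
    """
    return {
        extent: attrs
        for extent, attrs in children.items()
        if not any(extent < other for other in children)
    }


# Child(abcd, ∅) and Child(abc, 1) as given in the caption of Figure 1.
CHILD_TOP = {
    frozenset("abc"): frozenset({1}),
    frozenset("bd"): frozenset({2, 4}),
    frozenset("ac"): frozenset({3}),
}
CHILD_ABC = {
    frozenset("ac"): frozenset({1, 3}),
    frozenset("b"): frozenset({1, 2, 4}),
}
```

Applied to `CHILD_TOP`, the filter drops (ac, 3) and keeps (abc, 1) and (bd, 24); applied to `CHILD_ABC`, it keeps both children. The test is quadratic in the number of siblings, which is the cost the second characterization avoids.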

In general, subset testing operations are expensive. We can, however, make use of
the second part of the proposition (second characterization) to test closure using
exact set matching operations instead of subset testing operations. This is because, if
we process the children in decreasing order of their object-set size, we can test the
closure of $XS$ by comparing its size against the size of the attribute set (if it exists)
already associated with $\mathrm{obj}(XS)$. Namely, we first search whether $\mathrm{obj}(XS)$
exists by an exact set matching operation. If it does not, then $XS$ is closed. Otherwise,
if the size of the existing attribute set of $\mathrm{obj}(XS)$ is greater than $|XS|$, then
$XS$ is not closed. In our running example, $3$ is not closed because $\mathrm{obj}(3) = ac$
has a larger attribute set $13$.
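The bookkeeping behind the second characterization can be sketched as a lookup table from object sets to the largest attribute set recorded for them so far. In the Python sketch below a plain dictionary keyed by `frozenset` stands in for the exact-matching structure; the class `ExtentIndex` and its method names are hypothetical. The recorded entries replay the running example: (abc, 1) is processed first, and its children (ac, 13) and (b, 124) are recorded before the smaller siblings are tested.

```python
class ExtentIndex:
    """Maps each object set seen so far to the largest attribute set seen with it."""

    def __init__(self):
        self._best = {}

    def record(self, extent, attrs):
        """Remember `attrs` for `extent` unless a larger set is already stored."""
        current = self._best.get(extent)
        if current is None or len(attrs) > len(current):
            self._best[extent] = attrs

    def looks_closed(self, extent, attrs):
        """Exact-match lookup followed by a size comparison (no subset tests)."""
        current = self._best.get(extent)
        return current is None or len(current) <= len(attrs)


index = ExtentIndex()
index.record(frozenset("abc"), frozenset({1}))        # the closed child (abc, 1)
index.record(frozenset("ac"), frozenset({1, 3}))      # a child of (abc, 1)
index.record(frozenset("b"), frozenset({1, 2, 4}))    # a child of (abc, 1)
```

After these three records, `index.looks_closed(frozenset("ac"), frozenset({3}))` is false because ac is already associated with the larger attribute set 13, while (bd, 24) passes because bd has not been seen. The test is only reliable when the larger siblings and their children have been recorded first, which is why the processing order matters.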

4 Algorithm: Constructing a Concept/Galois Lattice 

In this section, we first describe the algorithm in general terms, independent of the im- 
plementation details. We then show how the algorithm can be implemented efficiently. 

4.1 High Level Idea 

Recall that constructing a concept lattice includes generating all concepts and
identifying each concept's successors.

Our algorithm starts with the top concept $(\mathcal{O}, \mathrm{attr}(\mathcal{O}))$. We
process the concept by computing all its successors, and then recursively process each
successor in either Depth First Search (DFS) order - the ordering obtained by a DFS
traversal of the lattice - or Breadth First Search (BFS) order. According to Proposition 2,
successors of a concept can be computed from its children. Let $C = (\mathrm{obj}(X), X)$ be
a concept. First, we compute all the children
$\mathrm{Child}(C) = \{(\mathrm{obj}(XS), XS) : S \in \mathrm{AttrChild}(X)\}$. Then for
each $S \in \mathrm{AttrChild}(X)$, we check if $XS$ is closed. If $XS$ is closed,
$(\mathrm{obj}(XS), XS)$ is a successor of $C$. Since a concept can have several
predecessors, it can be generated several times. We check its existence to make sure that
each concept is processed once and only once. The pseudo-code of the algorithm based on
BFS is shown in Algorithm 1.



Algorithm 1 Concept-Lattice Construction - BFS

 1: Compute the top concept C = (O, attr(O));
 2: Initialize a queue Q = {C};
 3: Compute Child(C);
 4: while Q is not empty do
 5:   C = dequeue(Q);
      Let X = int(C) and suppose AttrChild(X) = <S_1, S_2, ..., S_k>;
 6:   for i = 1 to k do
 7:     if XS_i is closed then
          Denote the concept (obj(XS_i), XS_i) by K;
 8:       if K does not exist then
 9:         Compute Child(K);
10:         Enqueue K to Q;
11:       end if
12:       Identify K as a successor of C;
13:     end if
14:   end for
15: end while
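A minimal Python sketch of Algorithm 1 follows. It is an illustration under stated assumptions rather than the implementation evaluated in this paper: hashing of `frozenset` keys stands in for tries, the closure test follows the second characterization of Proposition 3 and relies on recording the children of every concept as soon as they are computed, and concepts with an empty extent are never produced. The names `compute_children`, `build_lattice` and `FIGURE_1` are hypothetical, and `FIGURE_1` is an assumed reconstruction of the context of Figure 1.

```python
from collections import deque

# Assumed reconstruction of the Figure 1 context: object -> adjacent attributes.
FIGURE_1 = {"a": {1, 3}, "b": {1, 2, 4}, "c": {1, 3}, "d": {2, 4}}


def compute_children(extent, intent, nbr):
    """Child(extent, intent) as a dict {obj(XS): XS}."""
    columns = {}                              # attribute i -> extent ∩ nbr(i)
    for a in extent:
        for i in nbr[a]:
            if i not in intent:
                columns.setdefault(i, set()).add(a)
    classes = {}                              # object set -> equivalence class S
    for i, objects in columns.items():
        classes.setdefault(frozenset(objects), set()).add(i)
    return {e: intent | frozenset(s) for e, s in classes.items()}


def build_lattice(nbr):
    """Return (concepts, edges) for a non-empty context `nbr`.

    concepts maps extent -> intent; edges holds (extent, successor extent) pairs.
    """
    objects = frozenset(nbr)
    top_intent = frozenset(set.intersection(*(set(v) for v in nbr.values())))
    concepts = {objects: top_intent}
    best = {objects: top_intent}              # largest attribute set per extent
    children = {}

    def expand(extent, intent):
        kids = compute_children(extent, intent, nbr)
        children[extent] = kids
        for e, xs in kids.items():            # record children for later tests
            if e not in best or len(xs) > len(best[e]):
                best[e] = xs

    expand(objects, top_intent)
    queue = deque([objects])
    edges = set()
    while queue:
        extent = queue.popleft()
        # Larger object sets first, as the closure test requires.
        for e, xs in sorted(children[extent].items(), key=lambda kv: -len(kv[0])):
            if len(best[e]) > len(xs):
                continue                      # XS is not closed
            if e not in concepts:             # concept K does not exist yet
                concepts[e] = xs
                expand(e, xs)
                queue.append(e)
            edges.add((extent, e))            # K is a successor of C
    return concepts, edges
```

On `FIGURE_1` the sketch returns five concepts with non-empty extent, (abcd, ∅), (abc, 1), (bd, 24), (ac, 13) and (b, 124), and five successor edges; the child (ac, 3) is rejected because ac has already been recorded with the larger attribute set 13.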



4.2 Implementation 

The efficiency of the algorithm depends on the efficient implementation of processing a
concept, which includes three procedures: (1) computing Child(); (2) testing if an
attribute set is closed; (3) testing if a concept already exists.

First, we describe how to compute $\mathrm{Child}(\mathrm{obj}(X), X)$ in
$O(\sum_{a \in \mathrm{obj}(X)} |\mathrm{nbr}(a)|)$ time, using a procedure, called SPROUT,
described in the following lemma.

Lemma 1. For $(\mathrm{obj}(X), X) \in \mathcal{B}$, it takes
$O(\sum_{a \in \mathrm{obj}(X)} |\mathrm{nbr}(a)|)$ time to compute
$\mathrm{Child}(\mathrm{obj}(X), X)$.

Proof. Let $\mathrm{res}(X) = \bigcup_{a \in \mathrm{obj}(X)} \mathrm{nbr}(a) \setminus X$. For
each $i \in \mathrm{res}(X)$, we associate it with a set $E_i$ (which is initialized as an
empty set). For each object $a \in \mathrm{obj}(X)$, we scan through each attribute $i$ in its
neighbor list $\mathrm{nbr}(a)$ and append $a$ to the set $E_i$. This step takes
$O(\sum_{a \in \mathrm{obj}(X)} |\mathrm{nbr}(a)|)$ time. Next we collect all the sets
$\{E_i : i \in \mathrm{res}(X)\}$. We use a trie to group the same object sets: search $E_i$
in the trie; if it is not found, insert $E_i$ into the trie with $\{i\}$ as its attribute
set; otherwise, append $i$ to $E_i$'s existing attribute set. This step takes
$O(\sum_{i \in \mathrm{res}(X)} |E_i|) = O(\sum_{a \in \mathrm{obj}(X)} |\mathrm{nbr}(a)|)$ time.
Thus, this procedure, called SPROUT$(\mathrm{obj}(X), X)$, takes
$O(\sum_{a \in \mathrm{obj}(X)} |\mathrm{nbr}(a)|)$ time to compute
$\mathrm{Child}(\mathrm{obj}(X), X)$. □
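The two steps of the proof can be sketched in Python as follows: a scan over the adjacency lists builds the sets E_i, and a small trie over sorted object sequences groups the attributes that share the same E_i. The class `ExtentTrie`, the function `sprout` and the context `NBR` (an assumed reconstruction of Figure 1) are illustrative names, and this trie is a plain nested dictionary rather than the structure used in our implementation.

```python
class ExtentTrie:
    """Trie over object sets, each stored as its sorted sequence of objects."""

    _END = object()   # key under which a node keeps its attribute set

    def __init__(self):
        self.root = {}

    def find_or_insert(self, objects):
        """Return the attribute set stored for `objects` (a sorted sequence),
        inserting an empty one if the object set is not yet in the trie."""
        node = self.root
        for a in objects:
            node = node.setdefault(a, {})
        return node.setdefault(self._END, set())

    def items(self):
        """Yield (object tuple, attribute set) for every stored object set."""
        stack = [((), self.root)]
        while stack:
            prefix, node = stack.pop()
            for key, value in node.items():
                if key is self._END:
                    yield prefix, value
                else:
                    stack.append((prefix + (key,), value))


def sprout(extent, intent, nbr):
    """Compute Child(extent, intent) along the lines of the proof of Lemma 1."""
    e = {}                                   # attribute i -> list E_i
    for a in sorted(extent):                 # sorted scan keeps each E_i sorted
        for i in nbr[a]:
            if i not in intent:
                e.setdefault(i, []).append(a)
    trie = ExtentTrie()
    for i, e_i in e.items():
        trie.find_or_insert(e_i).add(i)      # equal E_i end at the same node
    return {
        frozenset(objs): frozenset(intent) | frozenset(s)
        for objs, s in trie.items()
    }


# Assumed reconstruction of the Figure 1 context: object -> adjacent attributes.
NBR = {"a": {1, 3}, "b": {1, 2, 4}, "c": {1, 3}, "d": {2, 4}}
```

Each object-attribute pair is touched once in the scan and once more when its E_i is walked through the trie, which mirrors the two terms in the bound of Lemma 1.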

For $S \in \mathrm{AttrChild}(X)$, we test if $XS$ is closed based on the second
characterization in Proposition 3. For this method to work, it requires processing the
children $\mathrm{Child}(\mathrm{obj}(X), X)$ in decreasing order of their object-set size.
Suppose $\mathrm{AttrChild}(X) = \{S_1, \ldots, S_k\}$ where
$|\mathrm{obj}(XS_1)| \ge |\mathrm{obj}(XS_2)| \ge \ldots \ge |\mathrm{obj}(XS_k)|$.
We process $S_{i-1}$ before $S_i$. If $XS_{i-1}$ is closed, we also compute its children
$\mathrm{Child}(\mathrm{obj}(XS_{i-1}), XS_{i-1})$. Now to test if $XS_i$ is closed, we check
if $\mathrm{obj}(XS_i)$ exists. If it does not, then $XS_i$ is closed. Otherwise, we compare
$|XS_i|$ against the size of the existing attribute set of $\mathrm{obj}(XS_i)$. If $|XS_i|$
is not smaller, then $XS_i$ is closed; otherwise it is not. To efficiently search for
$\mathrm{obj}(XS_i)$, we use a trie (with hashing over each node) to store the object sets of
concepts generated so far, and it takes linear time to search for and insert (if it does
not exist) an object set. That is, it takes $O(|\mathrm{obj}(XS_i)|)$ time to check if
$XS_i$ is closed. The total time it takes to check if all children are closed is
$O(\sum_{i=1}^{k} |\mathrm{obj}(XS_i)|)$.

Recall that a concept $C = (\mathrm{obj}(X), X)$ is uniquely determined by its extent
$\mathrm{obj}(X)$ or its intent $X$. Therefore, we can store either the object sets or the
attribute sets generated so far in a trie, and then test the existence of $C$ by testing
the existence of $\mathrm{obj}(X)$ or $X$. Since searching the object sets is already needed
for testing the closure of an attribute set as described above, testing the existence of
$\mathrm{obj}(X)$ comes for free.

Note that $\sum_{a \in \mathrm{obj}(X)} |\mathrm{nbr}(a)| \ge \sum_{i=1}^{k} |\mathrm{obj}(XS_i)| \cdot |S_i|$.
Hence, the time it takes to process a concept is dominated by the procedure SPROUT, which
takes $O(\sum_{a \in \mathrm{obj}(X)} |\mathrm{nbr}(a)|)$ time. If we can reduce the sizes of
the adjacency lists ($|\mathrm{nbr}()|$), we can reduce the running time of the algorithm.
Note that this basic algorithm is already as fast as any existing algorithm for
constructing a concept lattice (or for computing all concepts only, which takes
$O(\Delta^2)$ time where $\Delta$ is the maximum size of the adjacency lists).

In the following we describe how to dynamically update the adjacency lists so as to
reduce their sizes, and thus improve the running time of the algorithm.



Further Improvement: Dynamically Update Adjacency Lists. Consider a concept
$C = (\mathrm{obj}(X), X)$. The object sets of all descendants of $C$ are subsets of
$\mathrm{obj}(X)$. To compute the descendants of $C$, it suffices to consider the objects
restricted to $\mathrm{obj}(X)$. For $S \in \mathrm{AttrChild}(X)$, by definition, all
attributes in $S$ have the same adjacency lists when restricted to $\mathrm{obj}(X)$. That
is, for all $i \ne j \in S$,
$\mathrm{nbr}(i) \cap \mathrm{obj}(X) = \mathrm{nbr}(j) \cap \mathrm{obj}(X)\ (= \mathrm{obj}(XS))$.
In other words, for all $a \in \mathrm{obj}(X)$, $i \in \mathrm{nbr}(a) \iff j \in \mathrm{nbr}(a)$,
for all $i, j \in S$, i.e., the adjacency list of $a$ contains either all elements in $S$
or no element in $S$. Therefore, we can reduce the sizes of the adjacency lists of objects
by representing all attributes in $S$ by a single element. For example, in
Figure 2, we can use a single element $16$ to represent the two attributes $1$ and $6$,
and $35$ to represent $3$ and $5$. In doing so, we reduce the size of the adjacency list of
$b$ from five elements $\{1, 3, 4, 5, 6\}$ to three elements $\{16, 35, 4\}$. We call the
reduced adjacency lists the condensed adjacency lists, and denote the condensed adjacency
list by $\mathrm{cnbr}()$. The set of condensed adjacency lists corresponds to a reduced
cross-table. For example, the reduced cross-table of $\mathrm{Child}(abcde, \emptyset)$ of
the above example is shown in Figure 2.




Fig. 2. (a) A context. (b) The corresponding concept lattice. (c) Reduced cross-table of
$\mathrm{Child}(abcde, \emptyset)$ of the context.

In order to use the condensed adjacency lists in procedure SPROUT, we need to
process our concepts in BFS order, and this requires one extra level, i.e. processing in a
two-level manner. More specifically, for a concept $C = (\mathrm{obj}(X), X)$, we first
compute all its children $\mathrm{Child}(C)$. Then we dynamically update the adjacency lists
by representing the attributes in each child of $C$ with one single element. We then use
these condensed adjacency lists to process each child of $C$. That is, instead of using the
global adjacency lists, when processing $(\mathrm{obj}(XS), XS)$, we use the condensed
adjacency lists of its parent. It takes
$O(\sum_{S \in \mathrm{AttrChild}(X)} |\mathrm{obj}(XS)|)$ time for $C$ to generate its
condensed adjacency lists $\mathrm{cnbr}()$ (see Algorithm 3 in the Appendix for the
pseudo-code). The time for the procedure SPROUT is then
$O(\sum_{a \in \mathrm{obj}(X)} |\mathrm{cnbr}(a)|)$ (see Algorithm 2 in the Appendix for
the pseudo-code). Notice that since
$\sum_{a \in \mathrm{obj}(X)} |\mathrm{cnbr}(a)| \ge \sum_{S \in \mathrm{AttrChild}(X)} |\mathrm{obj}(XS)|$,
the time for updating the adjacency lists is subsumed by the time required for procedure
SPROUT. Therefore, our new running time is
$O(\sum_{a \in \mathrm{obj}(X)} |\mathrm{cnbr}(a)|)$ for each concept $(\mathrm{obj}(X), X)$.
See Algorithm 4 for the pseudo-code and Figure 3 for a step-by-step
illustration of the algorithm.
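The condensing step can be sketched in a few lines of Python. The function below mirrors the shape of Algorithm 3 in the Appendix, with hypothetical names; its input is the list of children of a concept, and the sample input is the set of children of (abcde, ∅) as read off panel (1) of Figure 3, which is an assumption about the garbled figure labels.

```python
def condense_adjacency_lists(extent, children):
    """Build condensed adjacency lists for the objects in `extent`.

    `children` is a list of (object set, S) pairs, one per child of the
    concept. Returns (content, cnbr): content[i] is the attribute class that
    the single element i stands for, and cnbr[a] lists the elements adjacent
    to object a.
    """
    content = {}
    cnbr = {a: [] for a in extent}
    for i, (objects, s) in enumerate(children, start=1):
        content[i] = frozenset(s)
        for a in objects:
            cnbr[a].append(i)
    return content, cnbr


# Children of (abcde, ∅), assumed from panel (1) of Figure 3.
CHILDREN_OF_TOP = [
    (frozenset("abc"), {1, 6}),
    (frozenset("bc"), {4}),
    (frozenset("bd"), {3, 5}),
    (frozenset("de"), {2}),
]
```

With this input, object b ends up with three condensed entries standing for 16, 4 and 35, in place of its five original attributes. The work done is one append per (child, object) pair, in line with the bound stated above.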

5 Variants of The Algorithm 

For some applications, one is not interested in the entire concept lattice. In the follow- 
ing, we will describe how to modify our algorithm to solve two special cases: enumer- 
ating all concepts only and constructing a frequent closed itemset lattice. 
5.1 Algorithm 2: Computing All Concepts or Maximal Bipartite Cliques

If one is interested in computing all the concepts but not their lattice order, as in
enumerating all maximal bicliques studied in [7], we can easily modify our algorithm
to give an even faster algorithm for this purpose. This is because in our algorithm each
concept is generated many times; more precisely, at least as many times as the number of
its predecessors. For example, in Figure 3, $(d, 235)$ is generated twice, once by each of
its predecessors. However, when we need all concepts only, we do not need to regenerate
the concepts again and again. This can be easily accomplished by considering the right
siblings only in the procedure SPROUT, i.e. changing line 3 to
"for $i \in \mathrm{nbr}(a)$ AND $i > s$ do", while the other parts of the algorithm remain
the same. Depending on the lattice structure, this can significantly speed up the
algorithm, as the number of siblings decreases in a cascading fashion. A more careful
analysis is needed for the running time of this algorithm.

5.2 Algorithm 3: Constructing a Closed Itemset Lattice 

In data mining, one is interested in large concepts, i.e. $(\mathrm{obj}(X), X)$ where
$|\mathrm{obj}(X)|$ is larger than a threshold. Our algorithm can naturally be modified to
construct such a closed itemset lattice: we stop processing a concept when the size of its
object set is less than the given threshold, where objects correspond to transactions and
attributes correspond to items. Theoretically, when the memory requirement is not a
concern, our algorithm is faster than all other existing algorithms (including the
state-of-the-art program CHARM-L) for constructing such a frequent closed itemset lattice.
In practice, however, for large data sets (such as those studied in data mining), the
data structure - a trie on objects (transactions) - requires a huge amount of memory, and
this may threaten the algorithm's practical efficiency. Nevertheless, it is not difficult
to modify our algorithm so that a trie on attributes (items) is used instead. Recall that
a trie on objects is required in two steps of our algorithm: testing the closure of an
attribute set and testing the existence of a concept. As noted above, the existence of a
concept can also be tested on its intent (i.e. attributes), thus we can use a trie on
attributes for testing the existence of a concept. To avoid using a trie on objects for
testing the closure of an attribute set, we can use the first characterization in
Proposition 3 instead; that is, we test the closure of an attribute set by using subset
testing of its object set against its siblings' object sets, as described in Section 3.
Further, we can employ the practically efficient diffset technique, as in CHARM(-L), for
both our SPROUT procedure and the subset testing operations. We are testing the
performance of the diffset-based implementation on the available benchmarks, and the
results will be reported elsewhere.
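The modification described in this section can be sketched in Python as follows. The sketch prunes children whose support falls below the threshold, tests closure by subset testing against siblings (the first characterization in Proposition 3), and tests existence on the itemset rather than on the transaction set. It does not implement diffsets, it uses hashing in place of a trie on items, and it assumes a non-empty input with `min_support` no larger than the number of transactions. The function name and the toy data are hypothetical.

```python
from collections import deque


def frequent_closed_lattice(transactions, min_support):
    """Closed itemsets with support >= min_support, plus their covering edges.

    `transactions` maps a transaction id to its set of items. Returns
    (support_set, edges): support_set maps each frequent closed itemset to the
    ids of the transactions containing it; edges holds (itemset, successor)
    pairs.
    """
    all_ids = frozenset(transactions)
    top = frozenset(set.intersection(*(set(t) for t in transactions.values())))
    support_set = {top: all_ids}          # doubles as the existence test on intents
    edges = set()
    queue = deque([top])
    while queue:
        x = queue.popleft()
        columns = {}                      # item -> ids of supporting transactions
        for tid in support_set[x]:
            for item in transactions[tid]:
                if item not in x:
                    columns.setdefault(item, set()).add(tid)
        children = {}                     # supporting ids -> equivalence class S
        for item, ids in columns.items():
            if len(ids) >= min_support:   # infrequent children are not processed
                children.setdefault(frozenset(ids), set()).add(item)
        for ids, s in children.items():
            if any(ids < other for other in children):
                continue                  # proper subset of a sibling: not closed
            xs = x | frozenset(s)
            if xs not in support_set:
                support_set[xs] = ids
                queue.append(xs)
            edges.add((x, xs))
    return support_set, edges


# Toy data: the assumed Figure 1 context read as four transactions.
TRANSACTIONS = {"a": {1, 3}, "b": {1, 2, 4}, "c": {1, 3}, "d": {2, 4}}
```

Pruning before the sibling test is safe here because any sibling whose transaction set properly contains a frequent one is itself frequent. With `min_support=2` the toy data yields the closed itemsets ∅, 1, 24 and 13; the itemset 124, supported only by transaction b, is cut off.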

6 Discussion 

Our interest in FCA stems from our research in microarray data analysis [1]. We have
implemented a not yet optimized version of our algorithm (with fewer than 500 effective
lines of C++). The program is very efficient for our applications, in which our data
consists of about 10000 objects and 29 attributes. It took less than 1 second for the
program to produce the concept lattice (about 530 vertices/concepts and 1500 edges)
on a Pentium IV 3.0GHz computer with 2G memory running Fedora 2 Linux.
The program is currently available upon request and will be released to the public in the
near future.

As FCA finds more and more applications, especially in bioinformatics, efficient
algorithms for constructing concept/Galois lattices are much needed. Our algorithm is
faster than the existing algorithms for this problem; nevertheless, it seems to have much
room for improvement.
Appendix



Algorithm 2 Sprout

Input: s, content and nbr
       (obj(X), X) is the s-th child of G. Let K = {1, ..., k} be all the children of G.
Output: Child(obj(X), X) = {(obj(XS_i), XS_i) : 1 <= i <= t}

 1: For each i ∈ K, set C_i = ∅.
 2: for a ∈ C do
 3:   for i ∈ nbr(a) \ {s} do
 4:     Append a to C_i;
 5:   end for
 6: end for
    The following takes O(Σ_{i ∈ K} |C_i|) = O(Σ_{a ∈ C} |nbr(a)|) time.
 7: Initialize a local trie T_C over objects;
 8: for i ∈ K do
 9:   if C_i does not exist in T_C then
10:     Insert C_i into T_C;
11:     S_i = content(i);
12:   else
13:     Merge S_i with content(i);
14:   end if
15: end for
16: Output all the pairs in T_C: {(obj(XS_j), XS_j) : 1 <= j <= t}.



Algorithm 3 CondenseAdjacentLists

Input: Child(C) = {(obj(XS_i), XS_i) : 1 <= i <= t}
Output: content(i) for i = 1 ... t, and new adjacency lists, nbr(a), a ∈ obj(X)

 1: For each a ∈ obj(X), nbr(a) = ∅;
 2: for i = 1 to t do
 3:   content(i) = S_i;
 4:   for each a ∈ obj(XS_i), append i to nbr(a);
 5: end for
Algorithm 4 Concept-Lattice Construction - 2-level BFS

 1: Compute the top concept C = (O, attr(O));
 2: Initialize a queue Q = {C};
 3: Initialize a trie T for the object set O;
 4: content(i) = {i} for i ∈ M;
 5: Child(C) = SPROUT(0, content, nbr);
 6: while Q is not empty do
 7:   C = dequeue(Q);
 8:   Sort the pairs in Child(C) according to their extent size in decreasing order:
      (obj(XS_i), XS_i), 1 <= i <= k.
 9:   (content, nbr) = CONDENSEADJACENTLISTS(Child(C));
10:   for i = 1 to k do
11:     Search obj(XS_i) in T;
        Denote (obj(XS_i), XS_i) by K;    ▷ K is not necessarily a concept.
12:     if obj(XS_i) does not exist then
13:       Insert obj(XS_i) into T, and associate it with the attribute set XS_i;
14:       Identify K as the successor of C;
15:       Child(K) = SPROUT(i, content, nbr);
16:       Enqueue K into Q;
17:     else if the attribute set associated with obj(XS_i) is not greater than XS_i then
18:       Identify K as the successor of C;
19:     end if
20:   end for
21: end while
[Figure 3: eight panels showing the partially built lattice after each step:
(1) Sprout(abcde, ∅); (2) Sprout(abc, 16); (3) Eliminate (bc, 4) as it is not closed;
(4) Sprout(bd, 35); (5) Sprout(de, 2); (6) Sprout(bc, 146);
(7) Eliminate (b, 1356) as it is not closed; (8) Sprout (b, 13456), (d, 235), (e, 27).]

Fig. 3. Step by step illustration of the 2-level BFS lattice construction algorithm. The context and
the corresponding lattice are shown in Figure 2.






